class: center, middle, inverse, title-slide

.title[
# AI-based chat bots in research
]
.subtitle[
## Incorporation of ChatGPT and other LLMs into statistical workflows
]
.author[
### Biometrics Team
]
.institute[
### Telethon Kids Institute
]
.date[
### 2024-05-29
]

---
class: inverse, top, center
background-image: url("data:image/png;base64,#images/promo_image.png")
background-size: cover

# It's getting hot in here?

---

<style>
.bottom-align-image img {
  position: absolute;
  bottom: 0;
  left: 50%;
  transform: translateX(-50%);
}
.title-slide {
  background-image: url("images/tki_pulsing.gif");
  background-image-width: 50%;
  background-size: 150px 150px;
  background-position: 100% 0%; /* just start changing this */
}
</style>

# It's getting hot in here?

<img src="data:image/png;base64,#images/weather_p1.jpeg" width="100%" />

---
# It's getting hot in here?

<img src="data:image/png;base64,#images/weather_p2.png" width="60%" style="display: block; margin: auto;" />

---
# It's getting hot in here?

<img src="data:image/png;base64,#images/weather_p3.png" width="33%" style="display: block; margin: auto;" />

---
# It's getting hot in here?

<img src="data:image/png;base64,#images/weather_p3_5.png" width="65%" style="display: block; margin: auto;" />

... 7 words

---
# It's getting hot in here?

<img src="data:image/png;base64,#images/weather_p4.png" width="45%" style="display: block; margin: auto;" />

---
# It's getting hot in here?

<img src="data:image/png;base64,#images/weather_p5.png" width="110%" style="display: block; margin: auto;" />

---
# Who are we?

The **Biometrics Team** (Matt, Bethy, Wes, Zac)

- The Institute's Statistics Team
- Providing statistical support via consultation
- Study design guidance
  - protocol/grant/ethics application support
- Analysis capacity
  - independent/outsourced analysis
- Many other things related to analysis
- Arrange a chat about anything!
  - biostatistics@telethonkids.org.au

---
# Who are we?

Hacky Hour

- Wednesdays
- 10am to 11am in the Manda
- Stop by with any questions (coding, statistical, life advice)

<img src="data:image/png;base64,#images/bm_hacky.png" width="50%" style="display: block; margin: auto;" />

---
# Who are we?

Viva Engage

- Ongoing dialogue around statistics in academic research
- Repository for useful analytical resources
- **ALL are welcome** to post!
- General things of interest related to analysis

<img src="data:image/png;base64,#images/bm_viva_engage.png" width="65%" style="display: block; margin: auto;" />

---
# Session Overview

This is an informal session; we do not have all the answers, so please ask questions or add your perspective whenever relevant.

--

*Topics:*

1. Risks and data governance
2. More powerful than you think...
3. Modelling
4. Use and trust levels
5. AI in other aspects of your research
6. Discussion

---
# Session Overview

<img src="data:image/png;base64,#images/cant_be.jpeg" width="60%" style="display: block; margin: auto;" />

---
# Some language

LLM (Large Language Model) is a term you may be familiar with. It refers to a model trained on a large volume of text (generally from a variety of sources) for the **purpose of _generating further text_**, typically in response to user prompts. LLMs are also commonly referred to as AI, or sometimes chat bots.

--

There are many implementations out there (constantly being updated), with new ones released frequently. OpenAI's ChatGPT may be the best known. We'll use chat bot/AI/LLM interchangeably.

<div class="bottom-align-image">
<img src="data:image/png;base64,#images/other_ai.png" width="65%" style="display: block; margin: auto;" />
</div>

---
# A show of hands

Who here has...

--

- Already tried using AI for anything? (e.g. drawing a picture, writing text)

--

- Tried using AI specifically for generating analysis code?

--

- A paid subscription to an AI bot?

--

- Thought about it, but didn't know where to start?

--

- Uploaded data into an AI bot?

---
class: inverse, top, center
background-image: url("data:image/png;base64,#images/promo_risk_gov.png")
background-size: cover

# Risks and data governance

---
# Risks and data governance

Telethon Kids has an 'AI Policy' in draft; it is not yet out for review.

**DO NOT** upload your data to an AI bot:

- this is an ethics, research governance, and data governance breach
  - unless the AI resides solely on the server that your data were approved to reside on (within your study approvals)
- policy is catching up with technology (not just at the Institute)

---
# Data Anonymiser app

Available now on a TKI (Perth) server is an app that will:

--

- accept your dataset (.csv/.xlsx), and
- return a *'similar' but random* dataset, with anonymised column headers.

--

The returned dataset (.csv) has:

- data that broadly reflects your original data
- continuous (and date) data within a similar range
- the same number of categories for categorical data (assuming <15)
- obscured data for other character variables
- obscured column headers

--

Your data are destroyed when the app is closed.

--

_Soon to be added: an R function that will also do this._

---
class: inverse, top, center
background-image: url("data:image/png;base64,#images/promo_anonymizer.png")
background-size: cover

# Data Anonymiser app - preview

---
# Data Anonymiser app - preview

<iframe width="100%" height="70%" src="http://tki-hoinf-2403.ichr.uwa.edu.au:3838/shiny_anonymize_app_v2/"></iframe>

[http://tki-hoinf-2403.ichr.uwa.edu.au:3838/shiny_anonymize_app_v2/](http://tki-hoinf-2403.ichr.uwa.edu.au:3838/shiny_anonymize_app_v2/)

---
class: inverse, top, center
background-image: url("data:image/png;base64,#images/promo_mptyt.png")
background-size: cover

# More powerful than you think

---
# Anonymiser app - V1

The first version of the Data Anonymiser app was made *without a single character of code being written by a human*.
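
---
# Data Anonymiser app - a hand-written sketch

For contrast with the AI-written app, the anonymisation rules listed earlier (similar ranges for continuous and date data, the same number of categories when there are fewer than 15, obscured text and column headers) can be sketched by hand in a few lines of R. This is a minimal, hypothetical sketch: `anonymise_df()` and its `max_levels` argument are illustrative names, uniform sampling within the observed range is an assumed strategy, and it is not the Institute's app or its forthcoming R function.

```r
# Hypothetical sketch only: NOT the Institute's app or its forthcoming R function.
# Returns a random data frame that loosely mimics the structure of `df`.
anonymise_df <- function(df, max_levels = 15) {
  n <- nrow(df)

  fake_column <- function(x) {
    if (inherits(x, "Date")) {
      # Dates: sample days uniformly within the observed range
      rng <- range(x, na.rm = TRUE)
      days <- seq(rng[1], rng[2], by = "day")
      return(days[sample.int(length(days), n, replace = TRUE)])
    }
    if (is.numeric(x)) {
      # Continuous: uniform draws within the observed range
      rng <- range(x, na.rm = TRUE)
      return(runif(n, rng[1], rng[2]))
    }
    levs <- unique(na.omit(as.character(x)))
    if (length(levs) < max_levels) {
      # Categorical: same number of categories, with neutral labels
      return(sample(paste0("cat_", seq_along(levs)), n, replace = TRUE))
    }
    # Other character data: random 8-letter strings
    vapply(seq_len(n),
           function(i) paste(sample(letters, 8, replace = TRUE), collapse = ""),
           character(1))
  }

  out <- as.data.frame(lapply(df, fake_column))
  names(out) <- paste0("var_", seq_along(out))  # obscured column headers
  out
}

# Usage with a small, made-up dataset
set.seed(1)
df <- data.frame(
  id    = sprintf("P%03d", 1:50),
  age   = sample(30:70, 50, replace = TRUE),
  stage = sample(c("I", "II", "III", "IV"), 50, replace = TRUE),
  visit = as.Date("2024-01-01") + sample(0:100, 50, replace = TRUE)
)
anon <- anonymise_df(df)
str(anon)
```

Uniform, column-by-column draws keep values inside the observed ranges but discard each variable's distribution shape and any relationships between columns; that suits dummy data for prompting, not analysis.

---
# Anonymiser app - V1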

<iframe width="100%" height="60%" src="http://tki-hoinf-2403.ichr.uwa.edu.au:3838/shiny_anonymize_app_v1/"></iframe>

[http://tki-hoinf-2403.ichr.uwa.edu.au:3838/shiny_anonymize_app_v1/](http://tki-hoinf-2403.ichr.uwa.edu.au:3838/shiny_anonymize_app_v1/)

---
# Anonymiser app - V1

<img src="data:image/png;base64,#images/app_as_zip.png" width="60%" style="display: block; margin: auto;" />

---
# Anonymiser app - V1

<img src="data:image/png;base64,#images/app_17_iterations.png" width="50%" style="display: block; margin: auto;" />

---
# Anonymiser app - Dummy dataset

The data we anonymised in the app were generated by ChatGPT.

<img src="data:image/png;base64,#images/synth_req.png" width="70%" style="display: block; margin: auto;" />

---
# Anonymiser app - Dummy dataset

It took 3 attempts to get it right.

<img src="data:image/png;base64,#images/synth_req_3.png" width="547" height="80%" style="display: block; margin: auto;" />

---
# Anonymiser app - Dummy dataset

Did you spot it?

<img src="data:image/png;base64,#images/synth_req_2.png" width="1131" height="20%" style="display: block; margin: auto;" />

---
# Anonymiser app - Dummy dataset

<img src="data:image/png;base64,#2024-05-AI-in-stats_files/figure-html/unnamed-chunk-16-1.svg" style="display: block; margin: auto;" />

<div class="bottom-align-image">
<img src="data:image/png;base64,#images/synth_req_2_1.png" width="80%" style="display: block; margin: auto;" />
</div>

---
# Anonymiser app - Dummy dataset

<img src="data:image/png;base64,#2024-05-AI-in-stats_files/figure-html/unnamed-chunk-18-1.svg" style="display: block; margin: auto;" />

<div class="bottom-align-image">
<img src="data:image/png;base64,#images/synth_req_2_2.png" width="70%" style="display: block; margin: auto;" />
</div>

---
# Anonymiser app - Online graph drawer

<img src="data:image/png;base64,#images/synth_plot_2.png" width="80%" style="display: block; margin: auto;" />

---
# Anonymiser app - Online graph drawer

<img
src="data:image/png;base64,#images/synth_plot_3.png" width="1196" height="95%" />

---
# Anonymiser app - Online graph drawer

<img src="data:image/png;base64,#images/synth_plot.png" width="1507" height="100%" />

---
class: inverse, top, center
background-image: url("data:image/png;base64,#images/promo_y.png")
background-size: cover

# Modelling

---
# Guiding Model Specification

+ **Types of decisions:**
  + What sort of model should be run?
  + What format should the data be in?
  + How could any modelling complexities be handled?
  + How should we interpret or visualise the model's output?
  + What variable transformations might be necessary for better model fit?

--

<br>

- **Could ChatGPT be prompted to guide us through this modelling process?**

---
# Example

Consider simulated lung cancer data from the [UCLA Office of Advanced Research Computing](https://stats.idre.ucla.edu/stat/data/hdp.csv).

- Cohort of 8,525 patients.
- Small set of independent variables.

--

<br>

**We want to explore:**

- How might patient- and physician-level factors relate to lung cancer remission following treatment?

---
## Data

- **Outcome**
  - `remission` (0/1)
- **Patient-level variables**
  - IL6 levels
  - CRP levels
  - Length of hospital stay
  - Family history
  - Cancer stage
- **Physician-level variables**
  - Physician experience
  - Physician ID
  - Hospital ID

<img src="data:image/png;base64,#images/hpd_head.png" width="675" height="120%" />

---
### Data Summary
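
<table class="table" style="color: black; margin-left: auto; margin-right: auto;">
<thead>
<tr>
<th style="text-align:left;" rowspan="2"> Characteristic </th>
<th style="text-align:center;" colspan="2"> Remission (N = 8,525) </th>
</tr>
<tr>
<th style="text-align:right;"> 0, N = 6,004<sup>1</sup> </th>
<th style="text-align:right;"> 1, N = 2,521<sup>1</sup> </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> FamilyHx </td>
<td style="text-align:right;"> 1,439 (24%) </td>
<td style="text-align:right;"> 266 (11%) </td>
</tr>
<tr>
<td colspan="3" style="text-align:left;"> CancerStage </td>
</tr>
<tr>
<td style="text-align:left;padding-left: 2em;"> I </td>
<td style="text-align:right;"> 1,580 (26%) </td>
<td style="text-align:right;"> 978 (39%) </td>
</tr>
<tr>
<td style="text-align:left;padding-left: 2em;"> II </td>
<td style="text-align:right;"> 2,357 (39%) </td>
<td style="text-align:right;"> 1,052 (42%) </td>
</tr>
<tr>
<td style="text-align:left;padding-left: 2em;"> III </td>
<td style="text-align:right;"> 1,301 (22%) </td>
<td style="text-align:right;"> 404 (16%) </td>
</tr>
<tr>
<td style="text-align:left;padding-left: 2em;"> IV </td>
<td style="text-align:right;"> 766 (13%) </td>
<td style="text-align:right;"> 87 (3.5%) </td>
</tr>
<tr>
<td style="text-align:left;"> Age </td>
<td style="text-align:right;"> 51 (47, 56) </td>
<td style="text-align:right;"> 50 (46, 54) </td>
</tr>
<tr>
<td style="text-align:left;"> IL6 </td>
<td style="text-align:right;"> 3.41 (1.97, 5.48) </td>
<td style="text-align:right;"> 3.18 (1.83, 5.24) </td>
</tr>
<tr>
<td style="text-align:left;"> CRP </td>
<td style="text-align:right;"> 4.40 (2.73, 6.68) </td>
<td style="text-align:right;"> 4.23 (2.59, 6.47) </td>
</tr>
<tr>
<td style="text-align:left;"> LengthofStay </td>
<td style="text-align:right;"> 6 (5, 6) </td>
<td style="text-align:right;"> 5 (5, 6) </td>
</tr>
</tbody>
<tfoot>
<tr>
<td colspan="3" style="text-align:left;"><sup>1</sup> n (%); Median (IQR)</td>
</tr>
</tfoot>
</table>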
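
---
### A logistic regression sketch

The Modelling slides centre on a logistic regression for `remission` and its odds-ratio interpretation. For anyone wanting to try that kind of model themselves, a minimal R sketch is below, under stated assumptions: the data are simulated with invented effect sizes (not the UCLA `hdp.csv` file), only `Age`, `FamilyHx` and `CancerStage` are used, and a plain `glm()` ignores the physician and hospital clustering. It is not the model, prompt or output shown on the slides.

```r
# Hypothetical sketch: simulated data loosely shaped like the hdp.csv variables.
# Not the model, prompts or output shown on the following slides.
set.seed(2024)
n <- 1000
sim <- data.frame(
  Age         = round(rnorm(n, mean = 51, sd = 7)),
  FamilyHx    = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2))),
  CancerStage = factor(sample(c("I", "II", "III", "IV"), n, replace = TRUE))
)

# Assumed "true" log-odds of remission, used only to simulate the outcome
log_odds <- 0.5 - 0.8 * (sim$FamilyHx == "yes") -
  0.5 * (as.integer(sim$CancerStage) - 1) - 0.02 * (sim$Age - 51)
sim$remission <- rbinom(n, size = 1, prob = plogis(log_odds))

# Logistic regression (ignores physician/hospital clustering)
fit <- glm(remission ~ Age + FamilyHx + CancerStage,
           family = binomial, data = sim)

# Odds ratios with Wald 95% CIs; drop the intercept (baseline odds, not an OR)
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))[-1, ]
round(or_table, 2)
```

Exponentiating a coefficient gives the multiplicative change in the odds of remission per one-unit increase (or relative to the reference level, e.g. stage I). Wald intervals from `confint.default()` keep the sketch quick; `confint(fit)` gives profile-likelihood intervals instead.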

---
## Modelling

<img src="data:image/png;base64,#images/log_reg_1_v2.png" width="90%" style="display: block; margin: auto;" />

---
### Model 1

<img src="data:image/png;base64,#images/log_res_2_v2.png" width="60%" style="display: block; margin: auto;" />

---
### Model 2

<img src="data:image/png;base64,#images/log_res_3_v2.png" width="60%" style="display: block; margin: auto;" />

---
### Output

<img src="data:image/png;base64,#images/log_reg_mod1_summ.png" width="150%" style="display: block; margin: auto;" />

---
## Interpretation

<img src="data:image/png;base64,#images/log_reg_6.png" width="50%" style="display: block; margin: auto;" />

---
## Model Diagnostics

<img src="data:image/png;base64,#images/hpd_vars.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#images/log_reg_7.png" width="90%" style="display: block; margin: auto;" />

---
## Interpretation: Odds Ratio

<img src="data:image/png;base64,#images/log_reg_5.png" width="60%" style="display: block; margin: auto;" />

---
## Comments

- Answers appear generally **informed and accurate**.
  - Code successfully runs.
  - Model specification is appropriate.
  - Range of options for interpreting model output.
  - Capable of identifying **variable transformations**.

--

<br>

- Prompts, at face value, rely minimally on jargon.

--

<br>

- How can we **be confident** in these answers without being an expert?
- Are we asking the **right questions**?

---
class: inverse, top, center
background-image: url("data:image/png;base64,#images/promo_x.png")
background-size: cover

# Use and Trust Levels

---
# Underutilisation and Overreliance

--

- Can we afford to ignore AI?

--

- Can we afford to ignore humans?

--

---
# Can we ignore AI?

<img src="data:image/png;base64,#images/jagged_paper.PNG" width="80%" style="display: block; margin: auto;" />

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321

---
# The Jagged Frontier

<!-- -->

---
# The Jagged Frontier

<!-- -->

---
# The Jagged Frontier

<!-- -->

---
# Study Description

- Two projects devised: one inside the frontier, the other outside it (for ChatGPT 4.0)
- Participants (N = 758 strategic consultants) randomised to one of the two projects.
- Projects contained 18 sequential tasks of four kinds (creative, analytical, writing, persuasiveness)
- Participants further randomised to either "ChatGPT" or "ChatGPT + Overview"

<table class="table" style="color: black; margin-left: auto; margin-right: auto;">
<thead>
<tr>
<th style="text-align:left;"> Group </th>
<th style="text-align:right;"> Inside frontier (n) </th>
<th style="text-align:right;"> Outside frontier (n) </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> ChatGPT </td>
<td style="text-align:right;"> 189 </td>
<td style="text-align:right;"> 190 </td>
</tr>
<tr>
<td style="text-align:left;"> ChatGPT + Overview </td>
<td style="text-align:right;"> 190 </td>
<td style="text-align:right;"> 189 </td>
</tr>
</tbody>
</table>

---
# Methodology

- First, all participants complete a baseline project without ChatGPT.
- The inside/outside groups then complete their respective projects.
- Speed and quality of work are compared between groups, relative to baseline levels.

---
# Results

<table class="table" style="color: black; margin-left: auto; margin-right: auto;">
<thead>
<tr>
<th style="text-align:left;"> Metric </th>
<th style="text-align:left;"> Inside frontier </th>
</tr>
</thead>
<tbody>
<tr grouplength="2"><td colspan="2" style="border-bottom: 1px solid;"><strong>ChatGPT</strong></td></tr>
<tr>
<td style="text-align:left;padding-left: 2em;" indentlevel="1"> Speed </td>
<td style="text-align:left;"> 28% </td>
</tr>
<tr>
<td style="text-align:left;padding-left: 2em;" indentlevel="1"> Quality </td>
<td style="text-align:left;"> 38% </td>
</tr>
<tr grouplength="2"><td colspan="2" style="border-bottom: 1px solid;"><strong>ChatGPT + Overview</strong></td></tr>
<tr>
<td style="text-align:left;padding-left: 2em;" indentlevel="1"> Speed </td>
<td style="text-align:left;"> 23% </td>
</tr>
<tr>
<td style="text-align:left;padding-left: 2em;" indentlevel="1"> Quality </td>
<td style="text-align:left;"> 43% </td>
</tr>
</tbody>
</table>

---
# Results

<table class="table" style="color: black; margin-left: auto; margin-right: auto;">
<thead>
<tr>
<th style="text-align:left;"> Metric </th>
<th style="text-align:left;"> Inside frontier </th>
<th style="text-align:left;"> Outside frontier </th>
</tr>
</thead>
<tbody>
<tr grouplength="2"><td colspan="3" style="border-bottom: 1px solid;"><strong>ChatGPT</strong></td></tr>
<tr>
<td style="text-align:left;padding-left: 2em;" indentlevel="1"> Speed </td>
<td style="text-align:left;"> 28% </td>
<td style="text-align:left;"> 18% </td>
</tr>
<tr>
<td style="text-align:left;padding-left: 2em;" indentlevel="1"> Quality </td>
<td style="text-align:left;"> 38% </td>
<td style="text-align:left;"> -13% </td>
</tr>
<tr grouplength="2"><td colspan="3" style="border-bottom: 1px solid;"><strong>ChatGPT + Overview</strong></td></tr>
<tr>
<td style="text-align:left;padding-left: 2em;" indentlevel="1"> Speed </td>
<td style="text-align:left;"> 23% </td>
<td style="text-align:left;"> 30% </td>
</tr>
<tr>
<td
style="text-align:left;padding-left: 2em;" indentlevel="1"> Quality </td>
<td style="text-align:left;"> 43% </td>
<td style="text-align:left;"> -24% </td>
</tr>
</tbody>
</table>

--

So clearly we can't ignore the benefits of AI for tasks within the frontier!

--

...but how do we *know* what's inside the frontier?

---
# A Hazy Jagged Frontier

<!-- -->

---
# Can we afford to ignore humans?

<img src="data:image/png;base64,#images/biostats_paper.PNG" width="100%" style="display: block; margin: auto;" />

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10646144/

---
# Results

<!-- -->

---
# One Example Reproduced

The question:

Suppose the probability of surviving from a particular disease is 0.9 and there are 20 patients. The number surviving will follow a Binomial distribution with p=0.9 and n=20. What is the probability that no more than 1 patient dies?

--

<!-- -->

---
# First Attempt

<img src="data:image/png;base64,#images/reproduced_q1.PNG" width="742" />

---
# First Attempt

<img src="data:image/png;base64,#images/reproduced_q2.PNG" width="712" />

---
# Binomial Distribution

<!-- -->

---
# Binomial Distribution

There are two ways to get the correct answer:

```r
size <- 20  # Number of trials (patients)
prob <- 0.1 # Probability of *dying*
pbinom(1, size, prob)  # P(at most 1 death)
```

```
## [1] 0.391747
```

```r
prob <- 0.9 # Probability of *surviving*
1 - pbinom(18, size, prob)  # P(at least 19 survive) = 1 - P(at most 18 survive)
```

```
## [1] 0.391747
```

---
# Second Attempt - Doubling Down

<img src="data:image/png;base64,#images/reproduced_q3.PNG" width="100%" style="display: block; margin: auto;" />

---
# Second Attempt - Doubling Down

<img src="data:image/png;base64,#images/reproduced_q4.PNG" width="100%" style="display: block; margin: auto;" />

---
# Third Attempt - Correct?

<img src="data:image/png;base64,#images/reproduced_q5.PNG" width="100%" style="display: block; margin: auto;" />

---
# Third Attempt - Correct?

<img src="data:image/png;base64,#images/reproduced_q6.PNG" width="100%" style="display: block; margin: auto;" />

---
# What about less 'Math-y' questions?

<img src="data:image/png;base64,#images/endodontic_paper.png" width="100%" style="display: block; margin: auto;" />

https://onlinelibrary.wiley.com/doi/full/10.1111/iej.13985

---
## Methodology

- ChatGPT 4.0 was asked 60 yes/no questions relating to endodontics (a branch of dentistry): 20 easy, 20 medium and 20 hard.

--

- Each question was asked 60 times, using four different accounts, at different times of day and across 10 days.

--

- Consistency was measured across the 60 answers to each question.

--

- Two independent experts answered the questions; ChatGPT's answers were compared with theirs to measure accuracy.

--

## Results

<table class="table" style="color: black; margin-left: auto; margin-right: auto;">
<thead>
<tr>
<th style="text-align:left;"> Difficulty </th>
<th style="text-align:left;"> Consistency </th>
<th style="text-align:left;"> Accuracy </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> Easy </td>
<td style="text-align:left;"> 86.75% </td>
<td style="text-align:left;"> 49.25% </td>
</tr>
<tr>
<td style="text-align:left;"> Medium </td>
<td style="text-align:left;"> 79.41% </td>
<td style="text-align:left;"> 64.58% </td>
</tr>
<tr>
<td style="text-align:left;"> Hard </td>
<td style="text-align:left;"> 90.17% </td>
<td style="text-align:left;"> 58.17% </td>
</tr>
<tr>
<td style="text-align:left;"> All </td>
<td style="text-align:left;"> 85.44% </td>
<td style="text-align:left;"> 57.33% </td>
</tr>
</tbody>
</table>

---
# The Jagged Frontier

<!-- -->

---
# A Moving Jagged Frontier - ChatGPT 5.0?

<!-- -->

---
# When to (and not to) use AI?

We asked the question earlier: "How would we know the model specification was correct if we were not an expert?"

With that in mind, we suggest two guidelines for using AI effectively:

--

- You should be able to verify that the AI output is true/accurate with a reasonable and responsible degree of certainty.

--

- The time it would take you to verify the AI output should be less than the time it would have taken to produce the same output yourself.

---
class: inverse, top, center
background-image: url("data:image/png;base64,#images/promo_AI_other.png")
background-size: cover

# AI in other aspects of research

---
# AI in other aspects of research

There are AI-based platforms and apps to assist in many other areas of research beyond statistical coding and analysis. This is not surprising, given that their strength is ingesting large volumes of text and forming connections within and between text-based content. Some examples:

--

- [SciSpace - Typeset](https://typeset.io/) - Paraphrasing and literature review.
- [Scholarcy - Summarizer](https://www.scholarcy.com/article-summarizer) - File upload (and URL) based summariser.
- [Consensus - Balance of evidence](https://consensus.app/) - Tries to indicate whether most papers support or negate a premise.
- [Scite - Assistant](https://scite.ai/assistant) - Ask questions, get explanations with citations (and citation previews); useful when drafting an Introduction.
- [Semantic Scholar - Skim reader](https://www.semanticscholar.org/product/semantic-reader) - Highlights and tags content within a paper.

---
# Scite - Assistant

<img src="data:image/png;base64,#images/other_references.png" width="95%" style="display: block; margin: auto;" />

---
# Semantic Scholar - Skim reader

<img src="data:image/png;base64,#images/other_skim.png" width="100%" style="display: block; margin: auto;" />

---
# AI in other aspects of research

- Most offer a free demo and/or free account
- Most have subscriptions that are relatively cheap ($10-$20 per month)
- New/alternative platforms are likely to appear regularly
- Current platforms may go offline (copyright?)
or be acquired and merged/closed

---
# Session Recap

*Topics:*

1. Risks and data governance
2. More powerful than you think...
3. Modelling
4. Use and trust levels
5. AI in other aspects of your research
6. Discussion

---
class: inverse, top, center
background-image: url("data:image/png;base64,#images/promo_image.png")
background-size: cover

# Fin. Discussion.

---
# Questions

- Can we afford to ignore AI?
- How can we know what is inside and outside the 'jagged frontier'?
- How do we assess the quality/correctness of AI output? Does this depend on the type of output?
- Can we afford to ignore humans?
- Are there any guidelines or 'rules of thumb' that could help us decide when (and when not) to use AI for a particular task?
- What are the implications for GPT-5? Blind trust? More difficulty seeing flaws?